
AI and Deep Learning At Work: How to Know If Your Images Are Storing Sensitive Information

Jun 02 2023

In today’s rapidly digitizing world, the importance of data security has become paramount. With the increasing amount of sensitive information being shared and stored online, securing information from cyber attacks, information breaches, and theft has become a top priority for companies of all sizes. Data loss prevention (DLP) is a critical part of the Netskope Intelligent Security Service Edge (SSE) security platform, providing best-in-class data security to our customers. 

Images often contain a wealth of valuable and sensitive data. Financial documents, personal identification, and confidential business communications frequently include images that require the utmost security. At Netskope, we have developed state-of-the-art deep learning-based computer vision classifiers that can analyze images and identify sensitive information in a wide variety of categories such as passports, driver’s licenses, credit cards, and screenshots. We have been awarded four U.S. patents for our innovative approach to data security. In this blog post, we highlight recent improvements to our image classifiers that resulted in higher accuracy and a better customer experience.

CNN Architecture Update

At the heart of our image classification models lie convolutional neural networks (CNNs). These powerful deep learning algorithms are specifically designed for image recognition and classification tasks. By employing a technique known as transfer learning, we take advantage of pre-existing CNNs that have been trained on large-scale datasets and fine-tune them using a smaller dataset of labeled images that contain sensitive information. As a result, our classifiers are able to quickly identify the unique patterns associated with the sensitive information, with high accuracy and reduced training time. 
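To make the transfer learning idea concrete, here is a minimal sketch in plain Python. It is not our production code: the `frozen_backbone` function is a hypothetical stand-in for a pre-trained CNN (it only computes two hand-written features), the toy images are synthetic, and the "head" is a simple logistic regression. What it shows is the division of labor: the backbone is reused as-is, and only a small classification head is trained on the labeled data.

```python
import math
import random

def frozen_backbone(image):
    """Stand-in for a pre-trained CNN. It maps an image (a list of pixel
    rows, values in [0, 1]) to a feature vector and is never updated."""
    flat = [p for row in image for p in row]
    brightness = sum(flat) / len(flat)
    diffs = [abs(row[i] - row[i + 1]) for row in image for i in range(len(row) - 1)]
    contrast = sum(diffs) / len(diffs)
    return [brightness, contrast]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Probability that the image is sensitive, according to the head."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_head(feature_rows, labels, epochs=500, lr=0.5):
    """Fine-tuning step: fit only a small logistic-regression head with SGD."""
    weights, bias = [0.0] * len(feature_rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feature_rows, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def make_image(rng, sensitive, size=8):
    """Toy data: 'sensitive' images are high-contrast stripes, others flat gray."""
    def pixel(col):
        base = float(col % 2) if sensitive else 0.5
        return min(1.0, max(0.0, base + rng.uniform(-0.05, 0.05)))
    return [[pixel(c) for c in range(size)] for _ in range(size)]

rng = random.Random(0)
labels = [i % 2 for i in range(20)]  # alternate negative and positive samples
images = [make_image(rng, bool(y)) for y in labels]
features = [frozen_backbone(img) for img in images]  # backbone output, computed once
head_weights, head_bias = train_head(features, labels)

# Score two unseen toy images: one striped ("sensitive"), one flat.
print(predict(head_weights, head_bias, frozen_backbone(make_image(rng, True))))
print(predict(head_weights, head_bias, frozen_backbone(make_image(rng, False))))
```

Because the backbone's features are computed once and only the head's few weights are updated, training is fast even with a small labeled dataset, which is the same trade-off that makes transfer learning attractive with a real CNN.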

There are several practical concerns in selecting a pre-trained CNN model. Because our SSE platform uses these classifiers to scan millions of customer files daily, it is crucial to keep false positives as low as possible to avoid overwhelming customers with spurious alerts. At the same time, since a true positive indicates a serious data leak, maintaining a high true positive rate is equally important. An additional challenge lies in creating classifiers complex enough to meet our accuracy goals yet compact enough to fulfill our stringent latency requirements, since they run in real time on the SSE platform. As such, we only considered pre-trained CNN architectures with fewer than 10 million parameters.
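A little arithmetic shows why the false positive rate matters so much at this scale. The sketch below uses entirely hypothetical numbers (one million files per day, 1,000 of them sensitive, and made-up operating points and candidate model sizes); it only illustrates how alert volume and the parameter budget interact with the constraints described above.

```python
def expected_daily_alerts(files_scanned, sensitive_files, tpr, fpr):
    """Expected true and false alerts per day for a given operating point."""
    benign_files = files_scanned - sensitive_files
    return {
        "true_alerts": round(sensitive_files * tpr),
        "false_alerts": round(benign_files * fpr),
    }

def within_parameter_budget(candidates, max_params=10_000_000):
    """Keep only candidate backbones below the parameter budget."""
    return [name for name, params in candidates.items() if params < max_params]

# Hypothetical workload: 1,000,000 files per day, 1,000 of them sensitive.
loose = expected_daily_alerts(1_000_000, 1_000, tpr=0.95, fpr=0.01)
strict = expected_daily_alerts(1_000_000, 1_000, tpr=0.95, fpr=0.001)
print(loose)   # {'true_alerts': 950, 'false_alerts': 9990}
print(strict)  # {'true_alerts': 950, 'false_alerts': 999}

# Hypothetical candidate backbones and parameter counts.
candidates = {
    "candidate_small": 4_000_000,
    "candidate_medium": 9_000_000,
    "candidate_large": 25_000_000,
}
print(within_parameter_budget(candidates))  # ['candidate_small', 'candidate_medium']
```

In this made-up scenario, a 1% false positive rate produces roughly ten false alerts for every true one, even though the classifier catches 95% of sensitive files. Because benign files vastly outnumber sensitive ones, small changes in the false positive rate dominate what a customer actually sees.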

EfficientNet Architecture (https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html)

In our latest model update, we transitioned to an EfficientNet pre-trained CNN architecture (shown in the figure above). This led to an 80% increase in the number of model parameters. Using a larger pre-trained model incurred a modest increase in latency but yielded a significant boost in real-world accuracy.

Training on real cloud data

In order to minimize false positives, it is important for our image classifiers to be exposed to a wide variety of realistic negative samples. To achieve this, we have sourced tens of thousands of actual cloud images from our own corporate data. This approach enables us to collect a substantial number of genuine training images, while simultaneously maintaining our commitment to customer privacy. These images were labeled by hand, with the majority of them being either negative examples or screenshots typical of real-world cloud data. 

In addition to these random negative examples, we have also incorporated several thousand carefully curated adversarial samples, further bolstering our classifiers’ resilience against false positives. One interesting type of adversarial sample is the labels found on electronics. Because of their bold fonts and high-contrast coloring, they can be mistaken for sensitive documents. By training our classifiers on these adversarial examples, we can effectively prevent such misclassifications in the production environment.

Custom data augmentations

Example of image augmentation. A training sample of a driver’s license is pasted on a realistic background, in this case a screenshot.

In addition to sourcing real cloud data, we employ a comprehensive suite of data augmentation techniques specifically designed for computer vision applications, such as rotation and cropping. What sets our approach apart is the customization of these augmentations to ensure maximum fidelity with the image data encountered in real cloud environments. One example is our custom augmentation that seamlessly integrates documents onto realistic backgrounds, such as a driver’s license pasted on a screenshot. This enables our classifiers to train on documents in a diverse range of settings, significantly boosting their versatility and performance on real-world data.
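The paste-on-background idea can be sketched in a few lines. This is an illustrative simplification, not our actual augmentation: images are represented as nested lists of pixel values rather than real image files, the function name `paste_on_background` is hypothetical, and a realistic version would also scale, rotate, and blend the document.

```python
import random

def paste_on_background(document, background, rng):
    """Return a copy of `background` with `document` pasted at a random
    position where it fits entirely, plus the (top, left) offset used.
    Both images are lists of pixel rows."""
    doc_h, doc_w = len(document), len(document[0])
    bg_h, bg_w = len(background), len(background[0])
    if doc_h > bg_h or doc_w > bg_w:
        raise ValueError("document must fit inside the background")
    top = rng.randint(0, bg_h - doc_h)
    left = rng.randint(0, bg_w - doc_w)
    result = [row[:] for row in background]  # copy so the background is reusable
    for r in range(doc_h):
        result[top + r][left:left + doc_w] = document[r]
    return result, (top, left)

rng = random.Random(42)
license_patch = [[9, 9, 9], [9, 9, 9]]      # hypothetical 2x3 "document"
screenshot = [[0] * 8 for _ in range(6)]    # hypothetical 6x8 "screenshot"
augmented, (top, left) = paste_on_background(license_patch, screenshot, rng)
for row in augmented:
    print(row)
```

The augmented image keeps the label of the pasted document, so one labeled driver’s license can yield many training samples across different backgrounds and positions.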

Summary

In our pursuit to develop cutting-edge AI security solutions, we continuously strive to refine our methodologies and data sources to build powerful, adaptive data security models capable of safeguarding the ever-evolving digital landscape.

To learn more about how Netskope helps customers protect their sensitive data everywhere across their entire enterprise, please visit the Netskope Data Loss Prevention page. To keep up with what our AI Labs team is writing about, please visit our AI Labs blog page.

Jason Bryslawskyj
At Netskope, Jason has been developing computer vision models for data loss prevention and phishing detection.